We analyzed 39 autoimmune disease- and trait-associated SNP sets obtained from Supplementary Table 1 of Farh, K. K.-H., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W. J., Beik, S., … Bernstein, B. E. (2014). “Genetic and epigenetic fine mapping of causal autoimmune disease variants.” Nature. doi:10.1038/nature13835.

First, we re-created the heatmap of shared genetic features among the autoimmune diseases and traits, that is, counts of genomic elements overlapping between pairs of terms. We will use this heatmap as a reference point to compare with the heatmaps produced by the regulatory similarity analysis.
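The shared-feature counts behind this reference heatmap can be sketched as follows. This is a minimal illustration, not the original pipeline: the function name `shared_counts`, the term names, and the locus identifiers are all hypothetical.

```python
from itertools import combinations

def shared_counts(term_to_elements):
    """Count genomic elements shared by each pair of terms."""
    terms = sorted(term_to_elements)
    counts = {}
    for a, b in combinations(terms, 2):
        # Set intersection gives the elements present in both terms.
        counts[(a, b)] = len(set(term_to_elements[a]) & set(term_to_elements[b]))
    return counts

# Hypothetical toy input: term -> identifiers of associated genomic elements.
toy = {
    "Term_A": {"locus1", "locus2", "locus3"},
    "Term_B": {"locus2", "locus3", "locus4"},
    "Term_C": {"locus5"},
}
overlap = shared_counts(toy)
```

Each value in `overlap` corresponds to one off-diagonal cell of the heatmap.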

Analysis of all regulatory datasets

We started with 4,498 regulatory datasets from the ENCODE project, processed with GenomeRunner. Some of these datasets showed no statistically significant enrichment in any of the 39 SNP sets; we removed them as non-informative and kept the remaining 2,969 regulatory datasets.

## [1] 4498   39
## [1] 2969   39
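The filtering step that reduces the matrix from 4,498 to 2,969 rows can be sketched as below. The significance threshold is not stated in this report, so `alpha=0.05` is an assumption, as are the function name and the toy dataset names. Because p-values in this report carry a sign to mark depletion, the sketch compares absolute values.

```python
def drop_uninformative(pvalues, alpha=0.05):
    """Keep datasets (rows) with at least one |p-value| below alpha.

    pvalues maps a dataset name to its list of signed p-values,
    one per SNP set. alpha is an assumed threshold.
    """
    return {
        name: row
        for name, row in pvalues.items()
        if any(abs(p) < alpha for p in row)
    }

# Hypothetical toy matrix: two datasets, three SNP sets.
toy_pvalues = {
    "dataset_A": [0.2, -0.001, 0.8],  # significant (depleted) in one SNP set
    "dataset_B": [0.5, 0.9, -0.3],    # never significant, so it is dropped
}
kept = drop_uninformative(toy_pvalues)
```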

We visualized the matrix of pair-wise Spearman correlation coefficients among the term-specific regulatory enrichment profiles.

We then compared how regulatory similarity correlates with similarity in shared genomic features. The Spearman correlation coefficient between the two is:

## [1] 0.395182
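The two steps above (a pairwise Spearman matrix of enrichment profiles, then a comparison of that matrix with the shared-feature matrix) can be sketched in plain Python. This is an illustration under assumptions: the report does not say how the two matrices were compared, so correlating their upper triangles is one plausible choice, and all names and toy values are hypothetical.

```python
from itertools import combinations

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def similarity_matrix(profiles):
    """Nested dict of pairwise Spearman correlations between profiles."""
    return {a: {b: spearman(profiles[a], profiles[b]) for b in profiles}
            for a in profiles}

def upper_triangle(matrix, names):
    """Off-diagonal values, one per unordered pair of names."""
    return [matrix[a][b] for a, b in combinations(names, 2)]

# Hypothetical enrichment profiles (one value per regulatory dataset).
profiles = {"T1": [1, 2, 3, 4, 5], "T2": [2, 1, 4, 3, 5], "T3": [5, 4, 3, 2, 1]}
names = list(profiles)
regulatory = similarity_matrix(profiles)

# Hypothetical shared-feature counts for the same pairs of terms.
genetic = {"T1": {"T2": 7, "T3": 0}, "T2": {"T3": 1}}
agreement = spearman(upper_triangle(regulatory, names),
                     upper_triangle(genetic, names))
```

Using only the upper triangle avoids counting each pair twice and excludes the trivial diagonal.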

The top 10 most similar pairs of disease- and trait-associated SNP sets are listed below. The correlation coefficient is the Spearman correlation between the regulatory enrichment profiles of the two term-specific SNP sets.

## 
## --------------------------------------------------------------------------------------------
##                   Disease 1                            Disease 2          Corr. coefficient 
## ---------------------------------------------- ------------------------- -------------------
##                HDL_cholesterol                       Triglycerides              0.473       
## 
##                LDL_cholesterol                       Triglycerides             0.4314       
## 
##             Chronic_kidney_disease                   Urate_levels              0.3742       
## 
##                HDL_cholesterol                      LDL_cholesterol            0.3475       
## 
##              Bone_mineral_density                   Type_2_diabetes            0.3225       
## 
##               Multiple_sclerosis               Primary_biliary_cirrhosis        0.316       
## 
##              Alzheimers_combined                    Type_2_diabetes            0.2999       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase       Urate_levels              0.2976       
## 
##         Fasting_glucose_related_traits              Type_2_diabetes            0.2972       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase      Platelet_counts            0.2944       
## --------------------------------------------------------------------------------------------
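A ranking like the one above can be extracted from a symmetric similarity matrix as sketched here. The function name `top_pairs` is hypothetical; the three coefficients are taken from the table above.

```python
from itertools import combinations

def top_pairs(similarity, names, n=10):
    """Return the n most similar unordered pairs as (a, b, coefficient)."""
    pairs = [(a, b, similarity[a][b]) for a, b in combinations(names, 2)]
    return sorted(pairs, key=lambda t: t[2], reverse=True)[:n]

names = ["HDL_cholesterol", "LDL_cholesterol", "Triglycerides"]
similarity = {
    "HDL_cholesterol": {"LDL_cholesterol": 0.3475, "Triglycerides": 0.473},
    "LDL_cholesterol": {"HDL_cholesterol": 0.3475, "Triglycerides": 0.4314},
    "Triglycerides": {"HDL_cholesterol": 0.473, "LDL_cholesterol": 0.4314},
}
best = top_pairs(similarity, names, n=2)
```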

The regulatory similarity dendrogram can be divided into four separate clusters:

## Cluster01 has  14 members 
## Platelet_counts
## Liver_enzyme_levels_gamma_glutamyl_transferase
## Red_blood_cell_traits
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Type_2_diabetes
## Fasting_glucose_related_traits
## Bone_mineral_density
## Alzheimers_combined
## Creatinine_levels
## Renal_function_related_traits_BUN
## Urate_levels
## Chronic_kidney_disease
##  
## Cluster02 has   9 members 
## Multiple_sclerosis
## Kawasaki_disease
## Celiac_disease
## Systemic_lupus_erythematosus
## Psoriasis
## Ulcerative_colitis
## Rheumatoid_arthritis
## Crohns_disease
## Autoimmune_thyroiditis
##  
## Cluster03 has   5 members 
## Primary_biliary_cirrhosis
## Ankylosing_spondylitis
## Systemic_sclerosis
## Migraine
## Primary_sclerosing_cholangitis
##  
## Cluster04 has  11 members 
## Juvenile_idiopathic_arthritis
## Atopic_dermatitis
## Alopecia_areata
## C_reactive_protein
## Allergy
## Type_1_diabetes
## Vitiligo
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## Asthma
## 
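Dividing a dendrogram into a fixed number of clusters can be sketched with a small agglomerative procedure. The report does not state the distance or linkage used, so the choices here (distance = 1 - correlation, average linkage) are assumptions, and the item names and similarities are hypothetical.

```python
def cut_into_clusters(names, dist, k):
    """Agglomerative clustering (average linkage), stopped at k clusters."""
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical similarities: A and B are alike, C and D are alike.
names = ["A", "B", "C", "D"]
similarity = {
    ("A", "B"): 0.9, ("A", "C"): 0.1, ("A", "D"): 0.1,
    ("B", "C"): 0.1, ("B", "D"): 0.1, ("C", "D"): 0.8,
}
dist = {a: {} for a in names}
for (a, b), rho in similarity.items():
    dist[a][b] = dist[b][a] = 1 - rho  # assumed conversion to a distance

clusters = cut_into_clusters(names, dist, k=2)
```

Stopping the merges at `k` clusters gives the same partition as cutting the full average-linkage tree at the height that leaves `k` branches.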

We estimated the differences in regulatory associations of term-specific SNP sets between clusters.

The first column shows the names of the regulatory datasets. The next two columns show the average p-values of the associations between each cluster's SNP sets and the regulatory dataset. The smaller the p-value, the more strongly the SNPs in a cluster are enriched in the corresponding regulatory dataset. A “-” sign indicates that an association is underrepresented (depleted). The “adj.P.Val” column shows whether the difference in associations between the two clusters is statistically significant. The last column describes the regulatory dataset. The tables are sorted by the “adj.P.Val” column; up to 10 of the most significantly different associations are shown.
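The signed p-value convention can be easier to read on a signed -log10 scale, where the magnitude reflects strength and the sign reflects direction. The sketch below is only a reading aid and is not claimed to be the transformation used in the analysis; `signed_score` is a hypothetical helper. The example values are those of the NFIC dataset in the c1 vs. c2 table.

```python
import math

def signed_score(p):
    """Convert a signed p-value to a signed -log10 score.

    Positive scores mean enrichment, negative scores mean depletion;
    larger magnitudes mean smaller p-values.
    """
    magnitude = -math.log10(abs(p))
    return -magnitude if p < 0 else magnitude

nfic_c1 = signed_score(-0.0696)    # weak depletion in cluster 1
nfic_c2 = signed_score(0.0001995)  # strong enrichment in cluster 2
```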

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 116"
## 
## ----------------------------------------------------------------------------------------------------------------
##                     Row.names                          c1        c2      adj.P.Val               V2             
## -------------------------------------------------- ---------- --------- ----------- ----------------------------
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1   -0.0696   0.0001995  1.664e-05     GM12878 NFIC v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1    -0.1765   0.002657   1.664e-05    GM12878 FOXM1 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2  -0.2057    0.00165   1.664e-05    GM12878 RUNX3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##          wgEncodeOpenChromFaireGm12892Pk            -0.2289   0.001429   2.372e-05    GM12892 FAIRE Peaks from  
##                                                                                        ENCODE/OpenChrom(UNC)    
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2    -0.1244   0.0004603  2.849e-05     GM12878 PML v042211.1    
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2   -0.04666  6.729e-05  3.617e-05     GM12878 MTA3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2  -0.2152   0.001387   3.617e-05    GM12878 STAT5A v042211.1  
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1   -0.1962   0.002065   3.617e-05     GM12878 ATF2 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep1  -0.1173   0.001776   4.267e-05    GM12878 RUNX3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
##      wgEncodeBroadHistoneGm12878H3k9me3StdPk       -2.144e-06 7.237e-08  4.636e-05  GM12878 H3K9me3 Histone Mods
##                                                                                        by ChIP-seq Peaks from   
##                                                                                             ENCODE/Broad        
## ----------------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 1"
## 
## ---------------------------------------------------------------------------------------------------
##                Row.names                   c1       c3      adj.P.Val               V2             
## ---------------------------------------- ------- --------- ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk -0.6355 4.879e-06   0.01794   Monocytes CD14+ CTCF Histone
##                                                                        Mods by ChIP-seq Peaks from 
##                                                                                ENCODE/Broad        
## ---------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 76"
## 
## ----------------------------------------------------------------------------------------------------------
##                     Row.names                         c2       c4     adj.P.Val             V2            
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2  0.00165  0.8762   0.005057    GM12878 RUNX3 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  0.0001995 0.8082   0.005057    GM12878 NFIC v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##          wgEncodeOpenChromFaireGm12892Pk           0.001429  0.8088   0.005057   GM12892 FAIRE Peaks from 
##                                                                                    ENCODE/OpenChrom(UNC)  
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.002657  0.8098   0.005057    GM12878 FOXM1 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0004603 0.8089   0.005801     GM12878 PML v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.007786  -0.9993   0.00689   GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2  6.729e-05 -0.9847   0.00689    GM12878 MTA3 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.001387  0.6498    0.00689   GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.002065  0.7675    0.00689    GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.007315  0.8856    0.00899    GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## ----------------------------------------------------------------------------------------------------------
## 
## [1] "c3 vs. c4 , number of degs significant at adj.p.val<0.5: 1"
## 
## --------------------------------------------------------------------------------------------------
##                Row.names                    c3       c4    adj.P.Val               V2             
## ---------------------------------------- --------- ------ ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk 4.879e-06 0.7054   0.08407   Monocytes CD14+ CTCF Histone
##                                                                       Mods by ChIP-seq Peaks from 
##                                                                               ENCODE/Broad        
## --------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##  &nbsp;   c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0   116   1    0  
## 
##  **c2**   0    0    0    76 
## 
##  **c3**   0    0    0    1  
## 
##  **c4**   0    0    0    0  
## ----------------------------
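Each cell in the counts table is the number of datasets whose adjusted p-value falls below the cutoff for that pair of clusters. The adjustment method is not stated in this report; the sketch below assumes the Benjamini-Hochberg procedure, and the raw p-values are hypothetical. The cutoff is passed as a parameter and set to 0.5 to match the value printed in the output.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvals)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def count_significant(pvals, cutoff):
    """Number of tests with adjusted p-value below the cutoff."""
    return sum(1 for p in bh_adjust(pvals) if p < cutoff)

raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw p-values
adjusted = bh_adjust(raw)
n_significant = count_significant(raw, cutoff=0.5)
```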

Summary

Cluster 2 differed most from clusters 1 and 4. Disease- and trait-associated SNPs from this cluster were enriched in regulatory signal from B cells: GM12878 (a B-lymphoblastoid cell line), other cell lines of the GM family, and CD20+ B lymphocytes yielded most of the association signal.

Cell_type   Frequency   Factor      Frequency
----------  ---------   ----------  ---------
gm12878     75          dnasei      45
cd20+       12          pol2        25
th1         12          h3k4me3     23
gm12892     10          pol2-4h8    11
th2         10          faire       6
dnd41       9           h3k4me2     5
gm12865     9           rna-pet     5
gm12891     9           h3k4me1     5
treg        6           h3k9ac      4
gm06990     5           atf2        4
cd4+        4           stat5a      4
gm12864     4           h3k27ac     4
gm18505     3           mta3        4
hela-s3     3           runx3       4
gm12875     2           h3k9me3     3
gm19193     2           h2az        3
cd14+       2           h3k27me3    3
gm18507     2           p300        2
gm15510     2           foxm1       2
th17        1           nfatc1      2
hcm         1           chd1        2
nhdf-neo    1           bclaf1      2
hepg2       1           tblr1       2
nh-a        1           bhlhe40     2
hsmm        1           ctcf        2
k562        1           cnv         2
gm10847     1           pml         2
hct-116     1           nfic        2
raji        1           whip        2
                        nfkb        2
                        ebf1        2
                        h3k79me2    1
                        ezh2        1
                        mxi1        1
                        h4k20me1    1
                        junctions   1
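A frequency table of this kind can be produced by tallying the cell type and the factor of each differentially associated dataset. The sketch assumes the (cell type, factor) annotation of each dataset is already available; how the annotations were derived is not described in this report, and the function name and toy list are hypothetical.

```python
from collections import Counter

def tabulate(annotations):
    """Tally cell types and factors, most frequent first."""
    cells = Counter(cell.lower() for cell, _ in annotations)
    factors = Counter(factor.lower() for _, factor in annotations)
    return cells.most_common(), factors.most_common()

# Hypothetical annotations, one (cell type, factor) pair per dataset.
annotations = [
    ("GM12878", "RUNX3"),
    ("GM12878", "NFIC"),
    ("GM12892", "FAIRE"),
    ("GM12878", "RUNX3"),
]
cell_counts, factor_counts = tabulate(annotations)
```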

Co-morbidity similarity analysis

We used the data from Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. PLoS Computational Biology, 5(4):e1000353 doi:10.1371/journal.pcbi.1000353, available at http://barabasilab.neu.edu/projects/hudine/resource/data/data.html. These data provide co-morbidity measurements for pairs of diseases. We mapped autoimmune disease and trait names to 3-digit ICD9 codes and evaluated how co-morbidity measurements correlate with regulatory similarity measurements, using the Phi measure of co-morbidity. The Spearman correlation coefficient between Phi and regulatory similarity is:

## [1] 0.4130129
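For orientation, the Phi measure is the Pearson correlation between two binary variables (here, presence or absence of each disease in a patient). The sketch below assumes that standard definition; the downloaded data already contain precomputed Phi values, so this only illustrates what the number means. The parameter names and counts are hypothetical.

```python
import math

def phi(c_ij, p_i, p_j, n):
    """Phi correlation for a pair of diseases.

    c_ij: patients with both diseases
    p_i, p_j: patients with disease i and with disease j
    n: total number of patients
    """
    numerator = c_ij * n - p_i * p_j
    denominator = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    return numerator / denominator

independent = phi(c_ij=10, p_i=20, p_j=50, n=100)  # co-occur as expected by chance
identical = phi(c_ij=20, p_i=20, p_j=20, n=100)    # always occur together
```

Values near 0 indicate co-occurrence at the rate expected by chance, and positive values indicate that the two diseases co-occur more often than expected.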

Iridescent literature similarity

## [1] "sharedRels correlation with regulatory similarity"
## [1] 0.09761721
## [1] "obsExp correlation with regulatory similarity"
## [1] 0.1953638
## [1] "minMim correlation with regulatory similarity"
## [1] 0.2849605
## [1] "directStr correlation with regulatory similarity"
## [1] 0.2301912
## [1] "relOverlap correlation with regulatory similarity"
## [1] 0.1429562
## [1] "misn correlation with regulatory similarity"
## [1] 0.302173

Analysis of TFBSs

We also performed regulatory similarity analysis using subsets of regulatory datasets, such as Transcription Factor Binding Sites or Histone Modification Marks. Here, out of all regulatory datasets, we selected only TFBSs.

## [1] 1954   39
## [1] 1259   39
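One simple way to pick a category of datasets is to match a keyword in the ENCODE track name, as sketched below. Whether the subsets in this report were selected this way is not stated, so the keyword rule and the function name are assumptions; the track names are taken from the tables in this report.

```python
def select_category(names, keyword):
    """Return the dataset names that contain the given keyword."""
    return [name for name in names if keyword in name]

names = [
    "wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2",
    "wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk",
    "wgEncodeBroadHistoneGm12878H3k9me3StdPk",
    "wgEncodeOpenChromFaireGm12892Pk",
]
tfbs = select_category(names, "Tfbs")
histone = select_category(names, "Histone")
```

Name-based selection follows the track grouping rather than the biology: for example, the CTCF dataset listed earlier in this report sits under a "Histone" track name.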

Next, we visualized the heatmap of regulatory similarity and checked how well it correlates with the original shared genetic overlap clustering:

## [1] 0.399781

The top 10 most similar pairs of disease-associated SNP sets are:

## 
## -----------------------------------------------------------------------------------------------
##                   Disease 1                             Disease 2            Corr. coefficient 
## ---------------------------------------------- ---------------------------- -------------------
##                HDL_cholesterol                        Triglycerides               0.5484       
## 
##                Kawasaki_disease                Systemic_lupus_erythematosus       0.5352       
## 
##              Bone_mineral_density                    Type_2_diabetes              0.5268       
## 
##                Kawasaki_disease                     Multiple_sclerosis             0.501       
## 
##                Kawasaki_disease                    Rheumatoid_arthritis           0.4775       
## 
##                 Celiac_disease                       Kawasaki_disease             0.4754       
## 
##                LDL_cholesterol                        Triglycerides               0.4743       
## 
##                Kawasaki_disease                     Ulcerative_colitis            0.4661       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase         Urate_levels               0.4191       
## 
##              Alzheimers_combined                   Bone_mineral_density           0.4149       
## -----------------------------------------------------------------------------------------------

The similarity dendrogram can be divided into separate clusters:

## Cluster01 has   8 members 
## Kawasaki_disease
## Systemic_lupus_erythematosus
## Celiac_disease
## Ulcerative_colitis
## Psoriasis
## Multiple_sclerosis
## Rheumatoid_arthritis
## Allergy
##  
## Cluster02 has   9 members 
## Systemic_sclerosis
## Primary_biliary_cirrhosis
## Atopic_dermatitis
## Juvenile_idiopathic_arthritis
## Ankylosing_spondylitis
## Crohns_disease
## Type_1_diabetes
## Autoimmune_thyroiditis
## Primary_sclerosing_cholangitis
##  
## Cluster03 has  10 members 
## Urate_levels
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Renal_function_related_traits_BUN
## Platelet_counts
## Red_blood_cell_traits
## C_reactive_protein
## Fasting_glucose_related_traits
##  
## Cluster04 has  12 members 
## Chronic_kidney_disease
## Alzheimers_combined
## Bone_mineral_density
## Type_2_diabetes
## Vitiligo
## Migraine
## Alopecia_areata
## Asthma
## Creatinine_levels
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## 

The cluster columns (for example, “c1” and “c2”) show the average p-values of the associations between each cluster's SNP sets and the regulatory dataset. A “-” sign indicates that an association is underrepresented. The “adj.P.Val” column shows whether the difference in associations between the two clusters is statistically significant.

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 54"
## 
## ---------------------------------------------------------------------------------------------------------
##                     Row.names                         c1       c2    adj.P.Val             V2            
## -------------------------------------------------- --------- ------ ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 0.8227  0.0001666   GM12878 RUNX3 v042211.1 
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  0.9186  0.0002231  GM18951 NFKB IgG-rab TNFa
##                                                                                    ChIP-seq Peaks from   
##                                                                                        ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  0.6894  0.0002231   GM12878 FOXM1 v042211.1 
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  0.9673  0.0002231  GM19099 NFKB IgG-rab TNFa
##                                                                                    ChIP-seq Peaks from   
##                                                                                        ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479 0.6107  0.0002929    GM12878 PML v042211.1  
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 0.8376  0.0002929   GM12878 ATF2 v042211.1  
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957  0.942   0.0003906  GM12878 STAT5A v042211.1 
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 0.4192  0.0004104   GM12878 NFIC v042211.1  
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  0.6266  0.0004104  GM12878 STAT5A v042211.1 
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  0.7082  0.0004104   GM12878 ATF2 v042211.1  
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## ---------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 56"
## 
## ----------------------------------------------------------------------------------------------------------
##                     Row.names                         c1       c3     adj.P.Val             V2            
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.3199  1.525e-06   GM12878 RUNX3 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  -0.2378  1.978e-06   GM12878 FOXM1 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 -0.1347  3.878e-06   GM12878 NFIC v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  -0.6289  5.781e-06  GM18951 NFKB IgG-rab TNFa
##                                                                                     ChIP-seq Peaks from   
##                                                                                         ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479 -0.3198  8.173e-06    GM12878 PML v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  -0.3812  8.605e-06   GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 -0.3287  8.605e-06   GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  -0.5609  9.529e-06  GM19099 NFKB IgG-rab TNFa
##                                                                                     ChIP-seq Peaks from   
##                                                                                         ENCODE/SYDH       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957  -0.3773  9.887e-06  GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  -0.4904  1.692e-05  GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## ----------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
## 
## -----------------------------------------------------------------------------------------------------------
##                     Row.names                         c1        c4     adj.P.Val             V2            
## -------------------------------------------------- --------- -------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.4205   1.012e-06   GM12878 RUNX3 v042211.1 
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479  -0.243   3.42e-06     GM12878 PML v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  -0.6439   3.42e-06   GM18951 NFKB IgG-rab TNFa
##                                                                                      ChIP-seq Peaks from   
##                                                                                          ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  -0.5047   3.42e-06    GM12878 FOXM1 v042211.1 
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 -0.2602   3.42e-06    GM12878 NFIC v042211.1  
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  -0.5014   3.42e-06   GM19099 NFKB IgG-rab TNFa
##                                                                                      ChIP-seq Peaks from   
##                                                                                          ENCODE/SYDH       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  -0.3133   3.42e-06   GM12878 STAT5A v042211.1 
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  -0.4064   3.42e-06    GM12878 ATF2 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 -0.4112   4.748e-06   GM12878 ATF2 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2  2.011e-05 -0.09226  5.47e-06    GM12878 MTA3 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## -----------------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##  &nbsp;   c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0    54   56   55 
## 
##  **c2**   0    0    0    0  
## 
##  **c3**   0    0    0    0  
## 
##  **c4**   0    0    0    0  
## ----------------------------

Text mining question 2: Based on literature strength, are the terms more strongly associated with the diseases in one cluster than in the other? Are the terms themselves related based on the literature? Expected answer: Yes, the literature associations should confirm the relationships.

Summary

  1. There are 4 clusters. The first cluster drives all the differences.
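--------------------------------------------------------------------------------------
      C1   C2                       C3                       C4
---- ---- ------------------------ ------------------------ ------------------------
C1         Cell types: Gm12878      Cell types: Gm12878      Cell types: Gm12878
           Reg: NFkB, Pol2, MTA3,   Reg: NFkB, Pol2, MTA3,   Reg: NFkB, Pol2, MTA3,
           NFIC, NFATC1             NFIC, NFATC1             NFIC, NFATC1

C2                                  Nothing significant      Nothing significant

C3                                                           Nothing significant

C4
--------------------------------------------------------------------------------------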

Analysis of histone marks

Out of all regulatory datasets, we selected only the histone marks.

## [1] 721  39
## [1] 610  39
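
The selection step can be sketched as follows. This is a minimal illustration on a toy matrix, not the original code. It assumes that enrichments are stored as signed p-values (rows are regulatory datasets, columns are SNP sets), that histone datasets can be recognized by "Histone" in the dataset name, and that non-informative datasets are those without any p-value below a cutoff. The object names `mtx` and `mtx.hist`, the 0.01 cutoff, and the toy values are hypothetical.

```r
# Toy matrix of signed enrichment p-values (hypothetical values):
# rows = regulatory datasets, columns = SNP sets
mtx <- matrix(c( 0.001, -0.5,  0.3,
                 0.2,    0.6, -0.9,
                -0.004,  0.8,  0.02,
                 0.5,   -0.7,  0.4),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("wgEncodeBroadHistoneGm12878H3k9acStdPk",
                                "wgEncodeBroadHistoneK562H3k36me3StdPk",
                                "wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1",
                                "wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1"),
                              c("Celiac_disease", "Psoriasis", "Asthma")))

# Keep histone-mark datasets only (assumed naming convention)
mtx.hist <- mtx[grepl("Histone", rownames(mtx)), , drop = FALSE]
dim(mtx.hist)

# Drop datasets with no significant enrichment in any SNP set (assumed cutoff);
# abs() removes the sign that marks underrepresentation
cutoff <- 0.01
keep <- apply(abs(mtx.hist), 1, function(p) any(p < cutoff))
mtx.hist <- mtx.hist[keep, , drop = FALSE]
dim(mtx.hist)
```

The two `dim()` calls mirror the two dimension printouts above: one before and one after filtering.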

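Next, we visualized the heatmap of regulatory similarity. As in the analysis of all regulatory datasets, we compared regulatory similarity with shared genomic feature similarity. The Spearman correlation coefficient between the two is: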

## [1] 0.27127
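
A comparison of two term-by-term similarity matrices, such as regulatory similarity versus shared genomic feature similarity, can be sketched as follows. This is an illustration under stated assumptions: both matrices are symmetric with the same term order, and only the unique off-diagonal pairs are compared. The objects `profiles`, `reg.sim`, and `shared` are hypothetical and filled with simulated values.

```r
set.seed(1)
terms <- c("A", "B", "C", "D", "E")

# Hypothetical regulatory similarity: Spearman correlations among
# simulated term-specific enrichment profiles (10 datasets x 5 terms)
profiles <- matrix(rnorm(50), nrow = 10, dimnames = list(NULL, terms))
reg.sim <- cor(profiles, method = "spearman")

# Hypothetical counts of shared genomic features, made symmetric by
# copying the upper triangle into the lower triangle
shared <- matrix(sample(0:20, 25, replace = TRUE), 5, 5,
                 dimnames = list(terms, terms))
shared[lower.tri(shared)] <- t(shared)[lower.tri(shared)]

# Compare each unique pair of terms once (upper triangle, no diagonal)
idx <- upper.tri(reg.sim)
rho <- cor(reg.sim[idx], shared[idx], method = "spearman")
rho
```

Restricting the comparison to the upper triangle avoids counting each pair twice and keeps the trivial self-similarities on the diagonal out of the estimate.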

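The 10 pairs of disease- and trait-associated SNP sets with the most similar histone mark enrichment profiles are listed below. The correlation coefficient is the Spearman correlation coefficient between the regulatory enrichment profiles of the two term-specific SNP sets.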

## 
## ---------------------------------------------------------------------------------------
##             Disease 1                         Disease 2              Corr. coefficient 
## --------------------------------- --------------------------------- -------------------
##          HDL_cholesterol                    Triglycerides                  0.621       
## 
##       Rheumatoid_arthritis               Ulcerative_colitis               0.4856       
## 
##          HDL_cholesterol                   LDL_cholesterol                 0.48        
## 
##          HDL_cholesterol                   Platelet_counts                0.4609       
## 
##          Platelet_counts                    Triglycerides                 0.4504       
## 
##          LDL_cholesterol                    Triglycerides                 0.4151       
## 
##         Creatinine_levels         Renal_function_related_traits_BUN       0.3915       
## 
##             Psoriasis               Systemic_lupus_erythematosus          0.3911       
## 
## Renal_function_related_traits_BUN           Urate_levels                  0.3689       
## 
##          Alopecia_areata                 C_reactive_protein               0.3686       
## ---------------------------------------------------------------------------------------
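
Extracting the top pairs from a correlation matrix can be sketched as follows. The helper `top_pairs` is hypothetical and is not part of the original analysis. The toy matrix reuses three coefficients from the table above.

```r
# Return the n most correlated unique pairs from a symmetric correlation matrix
top_pairs <- function(cor.mtx, n = 10) {
  # Row/column indices of the upper triangle: each pair once, no diagonal
  idx <- which(upper.tri(cor.mtx), arr.ind = TRUE)
  pairs <- data.frame(Disease1 = rownames(cor.mtx)[idx[, "row"]],
                      Disease2 = colnames(cor.mtx)[idx[, "col"]],
                      Corr     = cor.mtx[idx],
                      stringsAsFactors = FALSE)
  pairs <- pairs[order(pairs$Corr, decreasing = TRUE), ]
  head(pairs, n)
}

traits <- c("HDL_cholesterol", "LDL_cholesterol", "Triglycerides")
cor.mtx <- matrix(c(1,     0.48,   0.621,
                    0.48,  1,      0.4151,
                    0.621, 0.4151, 1),
                  nrow = 3, byrow = TRUE, dimnames = list(traits, traits))
res <- top_pairs(cor.mtx, n = 2)
res
```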

The similarity dendrogram can be divided into separate groups:

## Cluster01 has   6 members 
## Celiac_disease
## Multiple_sclerosis
## Kawasaki_disease
## Primary_biliary_cirrhosis
## Systemic_lupus_erythematosus
## Psoriasis
##  
## Cluster02 has  14 members 
## Type_2_diabetes
## Fasting_glucose_related_traits
## Red_blood_cell_traits
## Crohns_disease
## Migraine
## Systemic_sclerosis
## Ankylosing_spondylitis
## Platelet_counts
## Triglycerides
## HDL_cholesterol
## Vitiligo
## Progressive_supranuclear_palsy
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
##  
## Cluster03 has  11 members 
## Allergy
## Type_1_diabetes
## Primary_sclerosing_cholangitis
## Juvenile_idiopathic_arthritis
## Behcets_disease
## Ulcerative_colitis
## Rheumatoid_arthritis
## Autoimmune_thyroiditis
## Alopecia_areata
## C_reactive_protein
## Asthma
##  
## Cluster04 has   8 members 
## Bone_mineral_density
## Chronic_kidney_disease
## Alzheimers_combined
## Restless_legs_syndrome
## Atopic_dermatitis
## Urate_levels
## Renal_function_related_traits_BUN
## Creatinine_levels
## 
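
Dividing a similarity dendrogram into groups can be sketched as follows. The distance measure (1 minus the Spearman correlation), the default complete linkage, and the simulated `profiles` matrix are assumptions made for illustration. Only the choice of 4 groups follows the output above.

```r
set.seed(42)
# Simulated enrichment profiles: 25 regulatory datasets x 8 terms
profiles <- matrix(rnorm(200), nrow = 25,
                   dimnames = list(NULL, paste0("Term", 1:8)))

# Term-by-term regulatory similarity and the corresponding distance
cor.mtx <- cor(profiles, method = "spearman")
d <- as.dist(1 - cor.mtx)

# Hierarchical clustering, then cut the dendrogram into 4 groups
hc <- hclust(d)
clusters <- cutree(hc, k = 4)

# Print the cluster membership in the same layout as the listing above
for (k in sort(unique(clusters))) {
  members <- names(clusters)[clusters == k]
  cat(sprintf("Cluster%02d has %3d members\n", k, length(members)))
  cat(members, sep = "\n")
}
```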

The “Enrichment 1/2” columns (labeled with the cluster names, e.g., c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations. A “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val in the tables below) shows whether the difference in the associations between the groups is statistically significant.
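
A between-group comparison of this kind can be sketched as follows. The test used in the original analysis is not stated in this section, so the sketch substitutes a per-dataset Wilcoxon rank-sum test with Benjamini-Hochberg adjustment. It also summarizes each group by the mean of simulated enrichment scores rather than by averaged p-values. All object names, the simulated values, and the 0.1 cutoff are hypothetical.

```r
set.seed(7)
# Simulated enrichment scores: 6 regulatory datasets x 10 SNP sets
scores <- matrix(rnorm(60), nrow = 6,
                 dimnames = list(paste0("dataset", 1:6), paste0("term", 1:10)))
# Cluster membership of the 10 SNP sets
clusters <- rep(c("c1", "c2"), each = 5)

# Make dataset1 clearly stronger in cluster c1 for the illustration
scores["dataset1", clusters == "c1"] <- scores["dataset1", clusters == "c1"] + 10

# Per-dataset group means and Wilcoxon rank-sum p-values
res <- data.frame(
  c1 = rowMeans(scores[, clusters == "c1"]),
  c2 = rowMeans(scores[, clusters == "c2"]),
  p.value = apply(scores, 1, function(x)
    wilcox.test(x[clusters == "c1"], x[clusters == "c2"])$p.value)
)

# Adjust for multiple testing and sort by adjusted p-value
res$adj.P.Val <- p.adjust(res$p.value, method = "BH")
res <- res[order(res$adj.P.Val), ]

# Number of datasets differentially associated between the two clusters
sum(res$adj.P.Val < 0.1)
```

Counting the rows below the cutoff for every pair of clusters gives the kind of pairwise count matrix reported at the end of each analysis.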

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 44"
## 
## ------------------------------------------------------------------------------------------------------------
##                    Row.names                       c1        c2     adj.P.Val               V2              
## ----------------------------------------------- --------- -------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.5773   5.236e-08   GM12875 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 1 from   
##                                                                                          ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12  -0.205   8.601e-07   GM12878 H3K9ac Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 -0.06392  8.601e-07  GM12878 H3K4me2 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.3131   1.346e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 2 from   
##                                                                                          ENCODE/UW          
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3862   7.309e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 1 from   
##                                                                                          ENCODE/UW          
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202  -0.7181   9.149e-06   GM12864 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 2 from   
##                                                                                          ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527 -0.6869   1.924e-05  Dnd41 H3K9ac Histone Mods by 
##                                                                                     ChIP-seq Peaks from     
##                                                                                        ENCODE/Broad         
## 
##   wgEncodeBroadHistoneGm12878H3k04me3StdPkV2    4.708e-08  -0.167   2.222e-05  GM12878 H3K4me3 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.9988   3.557e-05  GM12878 H3K79me2 Histone Mods
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##   wgEncodeBroadHistoneGm12878H3k04me1StdPkV2    6.263e-15 -0.01015  3.557e-05  GM12878 H3K4me1 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## ------------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 54"
## 
## -----------------------------------------------------------------------------------------------------------
##                    Row.names                       c1       c3     adj.P.Val               V2              
## ----------------------------------------------- --------- ------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 0.8142   1.464e-06   GM12875 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12 0.2791   2.209e-05   GM12878 H3K9ac Histone Mods 
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 0.6608   3.836e-05   GM12865 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 2 from   
##                                                                                         ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 0.5631   5.197e-05  GM12878 H3K4me2 Histone Mods 
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.4995  7.28e-05   GM12878 H3K79me2 Histone Mods
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 0.7926   7.28e-05    GM12865 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202  0.9054   7.28e-05    GM12864 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 2 from   
##                                                                                         ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527 -0.9118  7.28e-05   Dnd41 H3K9ac Histone Mods by 
##                                                                                    ChIP-seq Peaks from     
##                                                                                       ENCODE/Broad         
## 
##       wgEncodeBroadHistoneDnd41H3k04me1Pk       9.105e-08 -0.7878  0.0001171  Dnd41 H3K4me1 Histone Mods by
##                                                                                    ChIP-seq Peaks from     
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm06990H3k4me3StdHotspotsRep1  0.0001636 -0.9922  0.0005313   GM06990 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## -----------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
## 
## -------------------------------------------------------------------------------------------------------------
##                    Row.names                       c1        c4      adj.P.Val               V2              
## ----------------------------------------------- --------- --------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212  -0.3919   2.056e-07   GM12875 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 1 from   
##                                                                                           ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 -0.02811   4.143e-06  GM12878 H3K4me2 Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12  -0.1383   4.143e-06   GM12878 H3K9ac Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.005656  5.047e-06  GM12878 H3K79me2 Histone Mods
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07  -0.2947   8.699e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 2 from   
##                                                                                           ENCODE/UW          
## 
##   wgEncodeBroadHistoneGm12878H3k04me3StdPkV2    4.708e-08 -0.01621   2.43e-05   GM12878 H3K4me3 Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202   -0.5261   2.535e-05   GM12864 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 2 from   
##                                                                                           ENCODE/UW          
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06  -0.3212   2.572e-05   GM12865 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 1 from   
##                                                                                           ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527  -0.3194   2.572e-05  Dnd41 H3K9ac Histone Mods by 
##                                                                                      ChIP-seq Peaks from     
##                                                                                         ENCODE/Broad         
## 
##       wgEncodeBroadHistoneDnd41H3k04me1Pk       9.105e-08 -0.06965   3.248e-05  Dnd41 H3K4me1 Histone Mods by
##                                                                                      ChIP-seq Peaks from     
##                                                                                         ENCODE/Broad         
## -------------------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c3 , number of degs significant at adj.p.val<0.5: 18"
## 
## -------------------------------------------------------------------------------------------------------
##                 Row.names                     c2       c3      adj.P.Val               V2              
## ------------------------------------------ -------- --------- ----------- -----------------------------
## wgEncodeBroadHistoneA549H3k79me2Dex100nmPk 0.008678 -0.02115    0.01579     A549 DEX 100 nM H3K79me2   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##   wgEncodeBroadHistoneHsmmH3k27me3StdPk    -0.02327 3.654e-07   0.02165   HSMM H3K27me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneNhaH3k27me3StdPk    -0.01506 0.0001926   0.02689   NH-A H3K27me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
## wgEncodeBroadHistoneA549H3k36me3Dex100nmPk  0.1047  -0.004845   0.02689     A549 DEX 100 nM H3K36me3   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##     wgEncodeBroadHistoneNhlfH3k79me2Pk     0.005332 -0.05179    0.03198   NHLF H3K79me2 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##   wgEncodeBroadHistoneK562H3k36me3StdPk    0.001587 -0.009793   0.03505   K562 H3K36me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneHsmmtH3k09me3Pk     -0.02596 0.003835    0.04333   HSMMtube H3K9me3 Histone Mods
##                                                                              by ChIP-seq Peaks from    
##                                                                                   ENCODE/Broad         
## 
##  wgEncodeBroadHistoneA549H3k27me3Etoh02Pk  -0.01807 0.002645    0.04472     A549 EtOH 0.02% H3K27me3   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##    wgEncodeBroadHistoneHsmmtH3k27me3Pk     -0.01922 0.001151    0.04472     HSMMtube H3K27me3 Histone  
##                                                                            Mods by ChIP-seq Peaks from 
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneNhdfadH4k20me1Pk    0.001681 -0.02737    0.04472   NHDF-Ad H4K20me1 Histone Mods
##                                                                              by ChIP-seq Peaks from    
##                                                                                   ENCODE/Broad         
## -------------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 5"
## 
## -----------------------------------------------------------------------------------------------------
##                Row.names                   c2        c4      adj.P.Val               V2              
## ---------------------------------------- ------- ---------- ----------- -----------------------------
## wgEncodeBroadHistoneNhdfadH3k36me3StdPk  0.1009  -0.005215    0.02027   NHDF-Ad H3K36me3 Histone Mods
##                                                                            by ChIP-seq Peaks from    
##                                                                                 ENCODE/Broad         
## 
##   wgEncodeBroadHistoneNhekH3k9me1StdPk   0.2002  -0.001186    0.02253   NHEK H3K9me1 Histone Mods by 
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## 
##  wgEncodeBroadHistoneHmecH3k36me3StdPk   0.01773 -0.0002847   0.04273   HMEC H3K36me3 Histone Mods by
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## 
## wgEncodeBroadHistoneGm12878H3k36me3StdPk 0.2219  -0.001063    0.05838   GM12878 H3K36me3 Histone Mods
##                                                                            by ChIP-seq Peaks from    
##                                                                                 ENCODE/Broad         
## 
##      wgEncodeBroadHistoneK562NcorPk      0.04478 -0.005174    0.0846      K562 NCoR Histone Mods by  
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## -----------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##  &nbsp;   c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0    44   54   55 
## 
##  **c2**   0    0    18   5  
## 
##  **c3**   0    0    0    0  
## 
##  **c4**   0    0    0    0  
## ----------------------------

Text mining question 2: Based on the strength of the literature associations, are the terms more strongly associated with the diseases in one cluster than in the other? Are the terms themselves related according to the literature? Expected answer: Yes, the literature associations should confirm the relationships.

Summary

  1. Again, cluster 1 is strongly distinct. Cluster 2 is less so. The histone marks all appear to be active.

Regulatory elements differentially associated between pairs of clusters:

C1 vs. C2, C1 vs. C3, C1 vs. C4: Cell types: GM12878, CD20+. Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2.
C2 vs. C3: Cell types: K562, NHEK, NHDF-Ad, NH-A, HMEC. Reg: H3K36me3, H4K20me1, H3K79me2.
C2 vs. C4: Nothing significant.
C3 vs. C4: Nothing significant.

Distribution of maxMin correlation coefficients